home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
Trading on the Edge
/
Trading On The Edge - CD-ROM Toolkit (Wayzata Technology)(2031)(1994).bin
/
pc
/
pc_files
/
mktdata
/
econdata
/
docutils
/
bump.doc
< prev
next >
Wrap
Text File
|
1992-02-10
|
18KB
|
359 lines
BUMP 1.1
A Beginner's Understandable Matrix Package
in Borland C++
From the dawn of programming, people have wanted to be able to write
something like
Matrix A(4,4), B(4,4), C(4,4), D(4,4);
...
A = (B + C)* D;
and get in the A matrix the result expected from matrix algebra. The best
that languages like Fortran, Basic, Pascal, or C could offer, however, was
either a lot of function calls or loops on subscripts. Special matrix
interpreters were developed which, however, limited what you could do. If
what you wanted to do, say graph the diagonal of a matrix, couldn't be done
by your package, you were stuck. Also, you frequently had to master large
manuals for one special-purpose program. With C++, however, it is
possible for the user to define a data type "Matrix" and then write
functions to "overload" the =, +, and * operators so that when the compiler
recognizes that, say, a + is between two matrices, it will call the user-
written function to add the two matrices. At the same time, you have all
the flexibility of C and C++ and very little to learn in the way of
special-purpose instructions. It is just a matter of writing the
necessary code to define the Matrix and Vector data types and to overload
the operators.
Since matrices are so widely used, I had expected that it would be easy to
find examples in books of exactly how to define the data types and write
the overloaded operator functions. I found hints and fragments, but no
fully worked out satisfactory solution. In particular, the examples in the
books tended to be bad about loosing track of memory that had been
allocated for temporary matrices (like (B + C) in the above example), so
that repeated use of equations like the one above would eventually stop the
program. Design of the destructors was of the essence for this problem,
and it gets scant attention in most introductions to C++. On the bulletin
boards, I found two extensive examples at the opposite extreme, Newmat and
Yamp. They were too complicated to serve as examples. I learned some
things from them but could not begin to really follow how they worked. So
I set out on my own to see if I could apply what I thought I knew. It was
a time-consuming process full of mistakes and enigmatic compiler messages.
But the result, BUMP, seems well worth the trouble.
In BUMP you will find applications of most of the C++ techniques. It has
classes with functions, derived classes with inheritance, non-trivial
constructors and destructors, overloaded operators, and even a virtual
function. In the hope that it may be useful to others both as an example
and as a useable product, I have made it publicly available. I hope that
BUMP may help you over the hump to effective use of C++.
BUMP is intentionally a minimal package so that the user can understand it
and extend it in directions of interest to him. Before trying to read the
code, however, it is probably a good idea to use BUMP as it is a bit so
that you know what its functions do before you trying to read them.
Accordingly, I'll first explain how to use it, and then add a few notes on
reading the code.
Use of BUMP
To use BUMP as it is, you write a program in C++. Don't worry if you know
only C and not C++; the code you write can be pure C except for your use of
BUMP's matrix and vector objects. You must compile your program with the
Borland C++ compiler, so the extension of its filename should be ".cpp".
Your program should have among the header lines the line
#include "bump.h";
and the linker command should link in bump.obj. Specific directions for
compiling and linking the example are given below.
Those simple measures are all that is necessary to be able to write a
program like the following, which illustrates some of the capabilities of
BUMP:
Vector a(4),at(1,4),c(4),ct(1,4);
Matrix A(4,4),B(4,4),C(4,4),D(4,1),E(1,1);
// Read in the matrix from an ASCII file and display it
A.ReadA("amatrix");
A.Display("Here is the A matrix:");
// Pre-multiply it by a scalar and divide it by a scalar
B = 2.*A;
B.Display("This is 2.*A.");
B = A/2.;
B.Display("and this is A/2.");
tap();
// Invert it. A ! in front of the matrix is the sign for inverse.
B = !A;
B.Display("and here is B = !A ( A inverse):");
// Multiply two matrices:
C = A*B;
// This next Display will have column width of 15 and 6 decimal places
C.Display("This is A*!A (should be I):",15,6);
// Parentheses work as expected:
B = (A + A)*!A;
B.Display("B = (A+A)*!A. Should be 2I");
C = A*A;
C.Display("A*A:");
C = (A+A)*(A+A);
C.Display("(A+A)*(A+A). Should be 4 times the matrix above.");
// Take the transpose. A ~ in front of a matrix indicates transpose.
B = ~A;
A.Display("Here is A again.");
B.Display("and this is ~A, A transpose.");
// Now some tests with vectors
a.ReadA("avector");
at = ~a;
a.Display("This is the a vector:");
at.Display("and this is its transpose. The difference doesn't show.");
D << a;
D.Display("And this is D << a :");
c = A*a;
c.Display("This is A*a");
ct = ~a*~A;
ct.Display("This is ~a*~A. Should look the same as the above.");
// Tests of the / operator. A/A is A'A computed without taking A'.
B = ~A*A;
B.Display("This is ~A*A");
B = A/A;
B.Display("This is A/A. Should be the same as the above.");
c = A/a;
ct = a/A;
c.Display("This is A/a:");
ct.Display("This is a/A. Should look like the above.");
E = a/a;
B = at/at;
E.Display("This is a/a:");
B.Display("This is at/at:");
printf("\nEnd of test.\n");
}
From this example, we see that BUMP has an extended data type Matrix. A
matrix has a function (ReadA) to read data into it from an ASCII file and
another function (Display) to display the matrix on the screen. The =,
+, -, and * operators have their expected definitions. Parentheses work in
the expected way. A ~ in front of a Matrix calculates its transpose, while
a ! in front of a Matrix calculates its inverse (by a primitive algorithm).
Neither affects the Matrix itself. (Perhaps you don't like ~ for
transposition and ! for inversion. The ' operator, however, is not
available in C++ for overloading; the ^, which I would have preferred for
inversion, is a binary operator in C++ and cannot be used for a unary
process like inversion or transposition. There is, in fact, not much
alternative to ~ and ! for these two unary operations.)
BUMP also has a Vector data type. All the same functions and operators
(except, of course, !) apply to Vectors, and to Vectors and Matrices
together, where dimensions are appropriate.
Pre-multiplication of a Vector or Matrix by a float (a scalar) has been
defined, as has division of a Matrix or Vector by a float. I did not
bother to define post-multiplication of a vector or matrix by a float.
Matrix division does not occur, so the / operator is available to mean
something else. In statistical computations, expressions like X'X or Z'X -
- ~X*X or ~Z*X in BUMP notation -- occur frequently with X' or Z' being
large matrices so that it undesirable to clog up scarce memory by actually
taking the transpose. These products can, of course, be computed directly
from X and Z. That is what Z/X does in BUMP; it computes ~Z*X without ever
creating ~Z.
To convert a Vector, v, to a Matrix, M, one uses
M << v;
Finally, element i of Vector v is denoted by v[i] on either side of the =
sign, while element (i,j) of Matrix M is denoted by M[i][j] on either side
of an =. The Vector elements are range-checked on both sides of an = sign.
Thus, if v has 20 elements and you ask for v[23], you will get an error
message. The range checking on the Matrix element is only on the row index.
(A reference to B[i][j] actually calls the B[i] function in BUMP, which
returns a pointer to the ith row of B. The [j] then functions as
subscripts normally do in C.) You can use these functions to do anything
to your matrices which BUMP does not do for you. For example, if you want
to zero out all the negative elements in the vector x, which has n elements
beginning with 1, then you just do
for(int i = 1; i <= n; i++)
if(x[i] < 0) x[i] = 0;
Thus, you have all the flexibility of C or C++, plus having a nice language
for common matrix operations.
The BUMP package includes an example program, called test.cpp. To build
it, type (at the DOS prompt) "bump test". To run it, type "test". When
you have created your own test, say, "mine.cpp", you would just type "bump
mine" to compile and link the program and "mine" to run it. You will find
that this process is using the bump.bat file, the bump.mak "Make" file, and
bump.res "response" file for the linker.
Perhaps a word of explanation is needed as to why BUMP has the Vector data
type as well as the Matrix data type. The representation of matrices in
BUMP is that used in Numerical Recipes in C, by William H. Press, et al.
(Cambridge University Press). This representation uses a vector of
pointers to the rows of the matrix. Its great advantage is that it allows
arrays of more than 64K bytes to be handled by a 16-bit compiler. It also
gives the user control over whether the subscripts start at 0, 1, or
whatever. It is, however, a very inefficient way to represent a matrix
which happens to be a column vector, so that it needs as many pointers to
rows as it has elements. BUMP therefore has a special data type for
vectors, Vector.
The full form for the declaration of a matrix or vector is
Matrix A(nr,nc,temp,fr,fc);
Vector x(nr,nc,temp,fr,fc);
where nr is the number of rows (no default value);
nc is the number of columns (default of 1 for vectors);
temp is 'y' if the object is temporary, to be destroyed by the
first operator to use it. Objects created by an operator
all have temp set to 'y'. This device is crucial for
throwing away the temporaries that get created in the
evaluation of expressions. Normally, when a user declares
an object, he will want temp = 'n', not temporary, which is
the default value. All objects are deleted if they "go out
of scope", that is, if the program exits from the function
in which they were declared.
fr is the first row; default is 1;
fc is the first column; default is 1.
Examples:
Matrix A(100,5); is a 100 X 5 non-temporary matrix with
first row = 1 and first column = 1.
Matrix A(35,10,'n',60,0); is a 35 X 10 non-temporary matrix with
first row = 60 and first column = 0.
BUMP has a few functions not needed in the examples above. If A is a
Vector or Matrix object, then
A.rows() gives the number of rows.
A.columns() gives the number of columns.
A.firstrow() gives the first row (default = 1).
A.lastrow() gives the last row.
A.firstcolumn() gives the first column (default = 1).
A.lastcolumn() gives the last column.
A.freeh() frees all the heap memory occupied by A. This is
useful when you need A for several operations but
then no longer need it and want to use the space
it occupies for something else. Don't try to use
A after doing A.freeh().
For a Matrix only, we also have
A.invert(int startrow, int endrow)
Invert the matrix and return the value of its determinant as a
double. Note that this operation turns the Matrix A into its
inverse, whereas !A leaves A unchanged. Start the pivot
operations in row startrow and stop when the pivot has been in
endrow. If these arguments are omitted, the pivoting starts in
the first row and continues through the last, to produce the true
inverse. The algorithm is Gauss-Jordan pivoting with no
niceties. Don't trust it if your matrix poses any problems for
inversion.
Study of the BUMP code
To study the C++ techniques used in BUMP, you should begin with the bump.h
file. You will see that both the Vector and Matrix types are derived from
a class called "vorm" -- Vector OR Matrix. Mostly "vorm" just has numbers
of rows and columns and the like. It does, however, contain one function,
freet(), which is exactly the same for both vectors and matrices. This
freet(), however, calls freeh(), which must be different for vectors and
matrices. Hence, freeh() is declared in vorm as a virtual function. The
real versions of freeh() are declared in Vector and Matrix.
The "temp" tag in the declaration of the Matrix and Vector data types is
the crucial element for avoiding loss of heap memory as the program
executes. When an operator creates a matrix or vector in order to have a
place to put what it computes, it sets the "temp" tag of the created object
equal to 'y' to indicate that the object is temporary. As far as I can
see, temporary objects are needed only once. Hence, whenever an operator
uses a temporary object, it releases all of the heap memory allocated to it
and marks it (v = 0 or m = 0) so that the automatically called destructor
will not release that memory a second time. This freeing is done by calls
to freet() in the operator functions. This freet(), declared in vorm,
tests the temporary flag and, if a temporary is found, frees all the heap
memory allocated object by a call to freeh(). (Hence the name "freet":
FREE Temporary, while "freeh" is FREE Heap memory.)
Another crucial point to be noted is described in the long comment in
Vector::Vector(Vector& a) in bump.cpp. By watching the debugger, I learned
that when a function returns an object that was originally declared in that
function and is therefore "local" to the function, it automatically calls,
before it returns, a constructor to copy the object that is being returned.
The original will then be deleted by an automatic call to the destructor.
This process, if left unchecked, will fragment the heap memory. BUMP uses
its temp tag so that the constructor detects what is happening and simply
steals the content of the original local object and sets up a flag so that
the destructor, to which that local object is headed, knows not to delete
that content. Thus, the fragmentation of heap memory is avoided. I found
watching this process with the debugger very helpful.
The other thing that it took a long time for me to get into my head is the
meaning of the much-used operator & in function declarations. In
float& Vector::operator [] (int i)
for example, what is returned is NOT a pointer to a float, but an lvalue
for the float. One of the return statements in this function is
return (v[i-fr]);
where v[i-fr] is just a plain old float. It is the & in the declaration of
the function that turns this float into the lvalue for the float. Now the
magic thing about lvalues is that on the right of an = sign they are just
numbers, but on the left of an = they are the place the number is stored.
Thus, if we have
Vector A(10);
float x;
x = 2.;
then
x = A[6];
will call the float& Vector::operator [] (int i) function to get the value
of the sixth element of A to put into x, while
A[6] = x;
calls exactly the same function to get the address at which to store x so
that it will be the sixth element of A. Such is the magic of the lvalue
and the & operator.
In the storage of matrices, I have used exactly the device from Numerical
Recipes in C. In a Matrix with fr and fc (first row and first column) both
1, m[1][1] will be the very first piece of storage allocated for the
matrix. In that case, m[0][0] would be outside the range of m and
reference to it would be an error. I did not follow this convention for
Vectors, where the first element is always in v[0]. I considered switching
over the Vector convention to the Matrix convention, but found that to do
so would simplify only 6 expressions and would complicate or at least not
simplify 31 expressions. So I did not switch.
* * *
BUMP is public domain. If you find bugs, I would like to know about them.
My own intentions are to leave BUMP pretty much as it is except for bug
fixes. I will be working on integrating it into other programs and will
probably develop a facility for sparse matrices, but this will not go into
BUMP, which needs to be left simple enough that the neophyte -- like me --
can understand it.
Clopper Almon
Department of Economics
University of Maryland
College Park, MD 20742
Internet: CA10@umail.umd.edu
Compuserve: 73377,1466